AITopics

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Neural Information Processing Systems, Nov-15-2025, 22:39:08 GMT

A Minimum-fuel cost (MF) derivation When c(a(t)) = |a(t)|

The action cost differs by row, where the corresponding optimal policy parameterization is highlighted in green.

action cost, action penalty, gaussian policy, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Griesbach, Sebastian, D'Eramo, Carlo

Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization

arXiv.org Artificial Intelligence, Oct-22-2025

Numerous heuristics and advanced approaches have been proposed for exploration in different settings for deep reinforcement learning. Noise-based exploration generally fares well with dense-shaped rewards and bonus-based exploration with sparse rewards. However, these methods usually require additional tuning to deal with undesirable reward settings by adjusting hyperparameters and noise distributions. Rewards that actively discourage exploration, i.e., with an action cost and no other dense signal to follow, can pose a major challenge. We propose a novel exploration method, Stable Error-seeking Exploration (SEE), that is robust across dense, sparse, and exploration-adverse reward settings. To this end, we revisit the idea of maximizing the TD-error as a separate objective. Our method introduces three design choices to mitigate instability caused by far-off-policy learning, the conflict of interest of maximizing the cumulative TD-error in an episodic setting, and the non-stationary nature of TD-errors. SEE can be combined with off-policy algorithms without modifying the optimization pipeline of the original objective. In our experimental analysis, we show that a Soft Actor-Critic agent with the addition of SEE performs robustly across three diverse reward settings in a variety of tasks without hyperparameter adjustments.
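
For readers unfamiliar with the quantity the abstract above refers to, here is a minimal sketch of a one-step TD error and of using its magnitude as an exploration signal. It is not the SEE algorithm and includes none of its three design choices. The function names, the use of a state-value estimate, and the choice of the absolute value are illustrative assumptions.

```python
def td_error(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step TD error: r + gamma * V(s') - V(s).

    The bootstrap term is dropped when the transition ends the episode.
    """
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def exploration_signal(reward, value_s, value_next, gamma=0.99, done=False):
    """Hypothetical exploration reward: the magnitude of the TD error.

    Large values mark transitions the value estimate predicts poorly.
    """
    return abs(td_error(reward, value_s, value_next, gamma, done))
```

With an action cost as the only reward (for example `reward=-1.0`), the task reward is negative while the exploration signal stays non-negative, which is why it can be optimized as a separate objective.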

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2506.13345

Country:

Europe > Germany (0.68)
North America > United States > California (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Pozanco, Alberto, Morales, Marianela, Borrajo, Daniel, Veloso, Manuela

Planning with Minimal Disruption

arXiv.org Artificial Intelligence, Aug-22-2025

In many planning applications, we might be interested in finding plans that minimally modify the initial state to achieve the goals. We refer to this concept as plan disruption. In this paper, we formally introduce it, and define various planning-based compilations that aim to jointly optimize both the sum of action costs and plan disruption. Experimental results in different benchmarks show that the reformulated task can be effectively solved in practice to generate plans that balance both objectives.
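
As a rough illustration of the idea in the abstract above, the following sketch scores a plan by a weighted sum of its action costs and its disruption. It assumes states are sets of facts and measures disruption as the number of facts that differ between the initial and final state. The function names, the `weight` parameter, and this particular measure are illustrative assumptions, not the paper's definitions or compilations.

```python
def disruption(initial_state, final_state):
    """Count facts that were added or removed (size of the symmetric difference)."""
    return len(set(initial_state) ^ set(final_state))


def plan_score(action_costs, initial_state, final_state, weight=1.0):
    """Weighted sum of total action cost and disruption; lower is better."""
    return sum(action_costs) + weight * disruption(initial_state, final_state)


# Hypothetical example: two plans that both move the robot to the lab.
initial = {"at(robot, hall)", "door_closed", "light_off"}
final_a = {"at(robot, lab)", "door_closed", "light_off"}  # restores the door and light
final_b = {"at(robot, lab)", "door_open", "light_on"}     # leaves them changed

score_a = plan_score([3, 2], initial, final_a)  # costlier actions, little disruption
score_b = plan_score([2, 1], initial, final_b)  # cheaper actions, more disruption
```

Under this weighting the plan with the higher action cost gets the lower combined score, because it leaves the world closer to how it started.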

artificial intelligence, plan disruption, planning & scheduling, (15 more...)

2508.15358

Genre: Research Report (0.50)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Neural Information Processing Systems, Aug-18-2025, 06:11:56 GMT

A Minimum-fuel cost (MF) derivation When c(a(t)) = |a(t)|

The action cost differs by row, where the corresponding optimal policy parameterization is highlighted in green.

action cost, artificial intelligence, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Neural Information Processing Systems, Aug-18-2025, 06:11:53 GMT

e46be61f0050f9cc3a98d5d2192cb0eb-Paper.pdf

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Santa Monica (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)

arXiv.org Artificial Intelligence, Jul-22-2025

Interleaved LLM and Motion Planning for Generalized Multi-Object Collection in Large Scene Graphs

Yang, Ruochu, Zhou, Yu, Zhang, Fumin, Hou, Mengxue

Household robots have been a longstanding research topic, but they still lack human-like intelligence, particularly in manipulating open-set objects and navigating large environments efficiently and accurately. To push this boundary, we consider a generalized multi-object collection problem in large scene graphs, where the robot needs to pick up and place multiple objects across multiple locations in a long mission of multiple human commands. This problem is extremely challenging since it requires long-horizon planning in a vast action-state space under high uncertainties. To this end, we propose a novel interleaved LLM and motion planning algorithm, Inter-LLM. By designing a multimodal action cost similarity function, our algorithm can both reflect the history and look into the future to optimize plans, striking a good balance of quality and efficiency. Simulation experiments demonstrate that, compared with the latest works, our algorithm improves the overall mission performance by 30% in terms of fulfilling human commands, maximizing mission success rates, and minimizing mission costs.

artificial intelligence, large language model, natural language, (18 more...)

2507.15782

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Alnazer, Ebaa, Georgievski, Ilche, Aiello, Marco

Risk Awareness in HTN Planning

arXiv.org Artificial Intelligence, Jun-5-2025

Real-world domains are characterised by uncertain situations in which acting and using resources may entail embracing risks. Performing actions in such domains involves costs of consuming some resource, such as time or energy, where the knowledge about these costs can range from known to totally unknown. In autonomous vehicles, actions have uncertain costs due to factors like traffic. Choosing an action requires assessing delay risks, as each road may have unpredictable congestion. Thus, these domains call for not only planning under uncertainty but also planning while embracing risk. Resorting to HTN planning as a widely used planning technique in real-world applications, one can observe that existing approaches assume risk neutrality, relying on single-valued action costs without considering risk. Here, we enhance HTN planning with risk awareness by considering expected utility theory. We introduce a general framework for HTN planning that allows modelling risk and uncertainty using a probability distribution of action costs upon which we define risk-aware HTN planning as being capable of accounting for the different risk attitudes and allowing the computation of plans that go beyond risk neutrality. We lay out that computing risk-aware plans requires finding plans with the highest expected utility. We argue that it is possible for HTN planning agents to solve specialised risk-aware HTN planning problems by adapting existing HTN planning approaches, and develop an approach that surpasses the expressiveness of current approaches by allowing these agents to compute plans tailored to a particular risk attitude. An empirical evaluation of two case studies highlights the feasibility and expressiveness of this approach. We also highlight open issues, such as applying the proposal beyond HTN planning, covering both modelling and plan generation.
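
The abstract above states that risk-aware plans are those with the highest expected utility over a probability distribution of action costs. The sketch below shows how the preferred plan can change with the risk attitude. The route names, cost distributions, and the two utility functions (linear for risk neutrality, exponential for risk aversion) are hypothetical choices made for illustration, not taken from the paper, and no HTN decomposition is modelled.

```python
import math


def expected_utility(cost_distribution, utility):
    """Expected utility of a plan whose total cost is a list of (cost, probability) pairs."""
    return sum(prob * utility(cost) for cost, prob in cost_distribution)


def risk_neutral(cost):
    """Linear utility: only the expected cost matters."""
    return -cost


def risk_averse(cost, alpha=0.5):
    """Exponential utility: high costs are penalized disproportionately."""
    return -math.exp(alpha * cost)


def best_plan(plans, utility):
    """Return the name of the plan with the highest expected utility."""
    return max(plans, key=lambda name: expected_utility(plans[name], utility))


# Hypothetical travel-time distributions (minutes, probability).
plans = {
    "highway": [(10.0, 0.5), (30.0, 0.5)],  # expected cost 20, but may be congested
    "side_road": [(21.0, 1.0)],             # expected cost 21, no uncertainty
}
```

A risk-neutral agent picks `highway` because its expected cost is lower, while the risk-averse utility picks `side_road` because it avoids the chance of a 30-minute trip.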

artificial intelligence, attitude, planning & scheduling, (18 more...)

2204.10669

Country: Europe (0.45)

Genre:

Research Report (1.00)
Workflow (0.92)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)
Leisure & Entertainment (0.67)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Akins, Sapphira, Mertens, Hans, Zhu, Frances

Cost-Aware Query Policies in Active Learning for Efficient Autonomous Robotic Exploration

arXiv.org Artificial Intelligence, Oct-31-2024

In missions constrained by finite resources, efficient data collection is critical. Informative path planning, driven by automated decision-making, optimizes exploration by reducing the costs associated with accurate characterization of a target in an environment. Previous implementations of active learning did not consider the action cost for regression problems or only considered the action cost for classification problems. This paper analyzes an AL algorithm for Gaussian Process regression while incorporating action cost. The algorithm's performance is compared on various regression problems, including terrain mapping on diverse simulated surfaces, along metrics of root mean square error, samples and distance until convergence, and model variance upon convergence. The cost-dependent acquisition policy doesn't organically optimize information gain over distance. Instead, the traditional uncertainty metric with a distance constraint best minimizes root-mean-square error over trajectory distance. This study's impact is to provide insight into incorporating action cost with AL methods to optimize exploration under realistic mission constraints.
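
The abstract above reports that an uncertainty metric with a distance constraint performed best. A minimal sketch of that kind of query rule follows, assuming the predictive variances have already been computed by some Gaussian Process model. The function name, the Euclidean distance, and the hard cutoff are assumptions for illustration and may differ from the paper's formulation.

```python
import math


def select_query(current, candidates, variances, max_distance):
    """Pick the candidate with the highest predictive variance within reach.

    current      -- current position, e.g. (x, y)
    candidates   -- candidate sample locations
    variances    -- model variance at each candidate (same order)
    max_distance -- travel budget for a single step
    Returns None when no candidate lies within max_distance.
    """
    best_point, best_variance = None, -math.inf
    for point, variance in zip(candidates, variances):
        within_reach = math.dist(current, point) <= max_distance
        if within_reach and variance > best_variance:
            best_point, best_variance = point, variance
    return best_point


candidates = [(1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]
variances = [0.2, 0.9, 0.5]
next_point = select_query((0.0, 0.0), candidates, variances, max_distance=3.0)
```

Here the most uncertain location `(5.0, 5.0)` is out of reach, so the rule settles for the most uncertain reachable one.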

artificial intelligence, convergence, machine learning, (18 more...)

2411.00137

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > France (0.04)

Genre: Research Report (1.00)

Industry: Energy (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.54)

arXiv.org Artificial Intelligence, Aug-26-2024

Decision-Focused Learning to Predict Action Costs for Planning

Mandi, Jayanta, Foschini, Marco, Holler, Daniel, Thiebaux, Sylvie, Hoffmann, Jorg, Guns, Tias

In many automated planning applications, action costs can be hard to specify. An example is the time needed to travel through a certain road segment, which depends on many factors, such as the current weather conditions. A natural way to address this issue is to learn to predict these parameters based on input features (e.g., weather forecasts) and use the predicted action costs in automated planning afterward. Decision-Focused Learning (DFL) has been successful in learning to predict the parameters of combinatorial optimization problems in a way that optimizes solution quality rather than prediction quality. This approach yields better results than treating prediction and optimization as separate tasks. In this paper, we investigate for the first time the challenges of implementing DFL for automated planning in order to learn to predict the action costs. There are two main challenges to overcome: (1) planning systems are called during gradient descent learning, to solve planning problems with negative action costs, which are not supported in planning. We propose novel methods for gradient computation to avoid this issue. (2) DFL requires repeated planner calls during training, which can limit the scalability of the method. We experiment with different methods approximating the optimal plan as well as an easy-to-implement caching mechanism to speed up the learning process. As the first work that addresses DFL for automated planning, we demonstrate that the proposed gradient computation consistently yields significantly better plans than predictions aimed at minimizing prediction error; and that caching can temper the computation requirements.
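
To make the distinction between solution quality and prediction quality in the abstract above concrete, the sketch below computes the regret of planning with predicted action costs. A real planner is replaced here by choosing among a fixed list of candidate plans, and all action names and cost values are hypothetical. It does not implement the paper's gradient computation or caching.

```python
def plan_cost(plan, costs):
    """Total cost of a plan (a sequence of action names) under a cost table."""
    return sum(costs[action] for action in plan)


def regret(candidate_plans, predicted_costs, true_costs):
    """True cost of the plan chosen under predicted costs, minus the true optimum."""
    chosen = min(candidate_plans, key=lambda p: plan_cost(p, predicted_costs))
    optimal = min(candidate_plans, key=lambda p: plan_cost(p, true_costs))
    return plan_cost(chosen, true_costs) - plan_cost(optimal, true_costs)


plans = [("drive_highway",), ("drive_town", "drive_bridge")]
true_costs = {"drive_highway": 10, "drive_town": 4, "drive_bridge": 3}

# Large prediction error, but the same plan is chosen as under the true costs.
coarse_prediction = {"drive_highway": 20, "drive_town": 9, "drive_bridge": 9}
# Small prediction error, but it flips the decision to the worse plan.
close_prediction = {"drive_highway": 6, "drive_town": 4, "drive_bridge": 3}
```

The coarse prediction has zero regret while the close one does not, which is the kind of mismatch that motivates training for decision quality rather than prediction error.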

action cost, planning problem, prediction, (15 more...)

2408.06876

Country:

Europe > Germany > Saarland (0.05)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Oceania > Australia (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry: Transportation (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)